Rank | Count | Beginning |
---|---|---|
13370 | 1215 | Kwa |
19723 | 1148 | Mwaka |
15975 | 818 | Makala |
8904 | 779 | Katika |
11733 | 682 | Kufuatana |
27739 | 448 | Wakati |
1921 | 346 | Baada |
5013 | 346 | Hata |
12671 | 290 | Kuna |
14699 | 276 | Lakini |
22714 | 274 | Pia |
18515 | 248 | Mji |
21794 | 233 | Ni |
5612 | 218 | Hii |
7960 | 218 | Jina |
6625 | 196 | Idadi |
24808 | 176 | Tangu |
5875 | 174 | Historia |
25030 | 163 | Tarehe |
6090 | 157 | Hivyo |
8512 | 152 | Kama |
18945 | 148 | Mnamo |
29023 | 142 | Watu |
29649 | 137 | Yeye |
17082 | 130 | Mara |
2392 | 113 | Baadhi |
23696 | 113 | Sehemu |
2269 | 111 | Baadaye |
8885 | 105 | Kati |
3808 | 103 | Eneo |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
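The query above can also be sketched outside the database. The following Python snippet is a minimal illustration, not the tooling used to produce the table: the function name `top_sentence_beginnings`, its parameters, and the toy sentences are assumptions made for the example. It mirrors `substring_index(sentence, ' ', N)` by splitting on single spaces and keeping the first N tokens, so the same function covers N = 1, 2, 3, 4.

```python
from collections import Counter


def top_sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent sentence beginnings of n words.

    Hypothetical helper: splits on a single space and keeps the first
    n tokens, which is what substring_index(sentence, ' ', n) returns.
    """
    counts = Counter(" ".join(s.split(" ")[:n]) for s in sentences)
    # most_common sorts by count, descending (like ORDER BY cnt DESC LIMIT ...)
    return counts.most_common(limit)


# Invented toy sentences, for illustration only (not taken from the corpus).
sample = [
    "Kwa sasa mji una wakazi wengi.",
    "Kwa sasa hali ni shwari.",
    "Mwaka 2010 idadi iliongezeka.",
    "Kwa hiyo watu walihamia mjini.",
]

print(top_sentence_beginnings(sample, n=1))
print(top_sentence_beginnings(sample, n=2))
```

Sentences with fewer than N words are counted under their full text, which matches the behaviour of `substring_index` when the delimiter occurs fewer than N times.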